Some minor changes suggested inline, but otherwise I'm happy with this 👍
It'd be good to rebase this on main to make sure that we didn't miss any other deprecated APIs from #364.
Once this is rebased and the comments addressed I'll start testing.
00aa6a0 to 5e88db0
This pull request has been transformed into a single commit that has no conflicts with the main branch.
5e88db0 to 5912195
Rebased to pull in recent fixes in
gpuCI: NVIDIA/thrust#1593
There were a few issues in CI: NVIDIA/thrust#1593
From the clang-9 build (I didn't check the others):
- Missing
- Using
5912195 to c814fa5
Fixed both and rebased.
List of individual changes:
- Fixed test errors
- OffsetT == unsigned long long for the 64-bit case
- using std::{is_same,conditional}
- using "portion" consistently for 2^28-2^30-sized chunks of the input array
- HasEnoughMemory() takes overwrite into account.
- moved checking for enough memory earlier.
- added a CTA_SYNC() to the histogram kernel
- disabled tests with NumItemsT != int for segmented sort
- testing with 4.5 bln. items
- tests for different NumItemsT
- NumItemsT for all device sorting functions
- wrapped ChooseOffsetT into namespace detail
- fixed typos
- templatized the type of num_items in 2 methods of DeviceRadixSort
- tuned radix sort with 64-bit OffsetT for V100
- tuned for 64-bit OffsetT for A100
- separate tuning parameters for 64-bit OffsetT
- improved downsweep policy for GP100
- option for 64-bit num_items with 32-bit shared memory histogram counters.
- introduced PartOffsetT into Onesweep kernel.
  - OffsetT is now only used for offsets into the whole array (e.g. bin counts or global read/write offsets)
  - PartOffsetT is used for offsets that do not exceed a single part (e.g. decoupled look-back, block index, number of items inside a part)
  - this fixes problems when OffsetT is unsigned, and also contributes towards supporting 64-bit num_items
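To make the OffsetT / PartOffsetT split and the "portion" idea from the list above concrete, here is a minimal host-side C++ sketch. It is not the code from this PR: the `sketch` namespace, the 32-bit choice for small `NumItemsT`, the fixed portion size of 2^30 items, and the helpers `NumPortions` and `PortionItems` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the ChooseOffsetT helper mentioned in the list.
// Assumption: 32-bit offsets when NumItemsT is at most 4 bytes wide,
// unsigned long long otherwise (the 64-bit case named in the list).
template <typename NumItemsT>
struct ChooseOffsetT {
  static_assert(std::is_integral<NumItemsT>::value,
                "NumItemsT must be an integral type");
  using Type = typename std::conditional<sizeof(NumItemsT) <= 4,
                                         std::uint32_t,
                                         unsigned long long>::type;
};

// Assumed portion size: 2^30 items, the upper end of the range quoted above.
constexpr unsigned long long kPortionSize = 1ull << 30;

// Offsets that never exceed one portion fit in 32 bits.
using PartOffsetT = std::uint32_t;

// Number of portions needed to cover num_items (rounded up).
template <typename OffsetT>
OffsetT NumPortions(OffsetT num_items) {
  const unsigned long long n = static_cast<unsigned long long>(num_items);
  return static_cast<OffsetT>((n + kPortionSize - 1) / kPortionSize);
}

// Number of items inside a given portion; only the last one may be short.
template <typename OffsetT>
PartOffsetT PortionItems(OffsetT num_items, OffsetT portion) {
  const unsigned long long n = static_cast<unsigned long long>(num_items);
  const unsigned long long begin =
      static_cast<unsigned long long>(portion) * kPortionSize;
  assert(begin < n);  // the portion index must lie inside the input
  const unsigned long long remaining = n - begin;
  return static_cast<PartOffsetT>(remaining < kPortionSize ? remaining
                                                           : kPortionSize);
}

}  // namespace sketch
```

Under these assumptions, the 4.5 billion item test size splits into 5 portions: four full ones of 2^30 items and a final one of 205,032,704 items, so every per-portion count fits in `PartOffsetT` while global offsets need the 64-bit `OffsetT`.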
c814fa5 to 178bbaa
Did one more rebase to bring in some CI fixes; gpuCI should not have any failures if all goes well.
gpuCI: NVIDIA/thrust#1593
main should be clear now. Trying again.
gpuCI: NVIDIA/thrust#1593
All set, thanks again @canonizer!
- OffsetT is supported for onesweep sorting
- OffsetT-sized counters in global memory